PPM compression without escapes
نویسنده
چکیده
A significant cost in PPM data compression (and often the major cost) is the provision and efficient coding of escapes while building contexts. This paper presents some recent work on eliminating escapes in PPM compression, using bit-wise compression with binary contexts. It shows that PPM without escapes can achieve averages of 2.5 bits per character on the Calgary Corpus and 2.2 bpc on the Canterbury Corpus, both values comparing well with accepted good compressors.
منابع مشابه
Native Language Identification with PPM
This paper reports on our work in the NLI shared task 2013 on Native Language Identification. The task is to automatically detect the native language of the TOEFL essays authors in a set of given test documents in English. The task was solved by a system that used the PPM compression algorithm based on an n-gram statistical model. We submitted four runs; word-based PPMC algorithm with normaliza...
متن کاملEnsemble Prediction by Partial Matching
Prediction by Partial Matching (PPM) is a lossless compression algorithm which consistently performs well on text compression benchmarks. This paper introduces a new PPM implementation called PPM-Ens which uses unbounded context lengths and ensemble voting to combine multiple contexts. The algorithm is evaluated on the Calgary corpus. The results indicate that combining multiple contexts leads ...
متن کاملPPMexe: PPM for Code Compression
With the emergence of software delivery platforms such as Microsoft’s .NET, code compression has become one of the core enabling technologies strongly affecting system performance. In this paper, we present compression mechanisms for executables that explore their syntax and semantics to achieve superior compression rates. The fundament of our compression codec is the generic paradigm of predic...
متن کاملGeneric Adaptive Syntax-Directed Compression for Mobile Code
We propose a new scheme for compressing mobile programs. Our proposal is meant as part of a larger infrastructure for code distribution and deployment. In this paper we show how to effectively compress programs on the source level by compressing abstract syntax trees (ASTs) which are equivalent to source code (modulo comments and layout). We compress ASTs by adapting the wellknown PPM (predicti...
متن کاملLIPT: A Reversible Lossless Text Transform to Improve Compression Performance
Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. We propose an alternative approach in this paper to develop a reversible transformation that can be applied to a source ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Softw., Pract. Exper.
دوره 42 شماره
صفحات -
تاریخ انتشار 2012